Abstract: The complexity of a CSMA algorithm is translated into the norm properties of a dependencies matrix. The maximum throughput optimization is reformulated by including the dependencies matrix in the formulation. It is shown that for interference graphs $\mathcal{G}$ whose minimum vertex cover size is $C(\mathcal{G}) = \log n$, where $n$ is the number of links, the optimal strategy of the links is to transmit with probability 1, i.e., a service-rate agnostic approach.

Several numerical analyses have been conducted in order 
to illustrate the effect of the interference graph, transmission 
strategy and arrival rate on the dependencies matrix. 

I. INTRODUCTION 

“Complexity”, once an ordinary noun describing objects with many interconnected parts, now designates a specific field with many branches. In this paper a system is considered complex when it shows emergence properties. Emergence in this case refers to a situation where the aggregate of interactions exhibits properties not attained by summation (the whole is more than the sum of its parts). From a design perspective, complex systems should be decomposed into weakly interacting subsystems to avoid such properties. The focus of this paper is on the complexity of scheduling in communication networks.

The idea of layering for complexity decomposition has been applied previously to communication network protocols [1]. Although layering techniques have provided a very efficient platform for communication networks, the arrival of cognition in modern radios has increased the complexity. These cognitive abilities shift the underlying models of communication systems from complex physical systems to complex adaptive systems. This is because of the ability of cognitive nodes to interact with each other in a distributed way, where each node not only learns from the radio environment but also interacts with other nodes. Decomposition is a good solution when there is some coupling or interaction between networking problems. The general idea of decomposition is to break the problem into smaller ones and solve each of the smaller ones in a distributed manner [2]. In this work a resource scheduling situation is described where distributed optimization is not efficient due to the emergence properties of the system. This is because the optimization of the whole system is more than the sum of its distributed optimization parts.

The focus of this paper is on carrier sense multiple access (CSMA) scenarios. This is due to its connection to Markov chains, one of the few mathematically describable models for the study of complex systems. We show that if the scheduling parameters in CSMA scheduling exceed a specific threshold, the local observations of the links may not be effective for a distributed learning mechanism. This is because the local observations of different links become tied together to a level that the parameters of distant links must be considered to achieve the required efficiency. The question we address here is how the arrival rate, the interference graph and the simple gradient methods for adjusting the transmission rates affect the complexity of maximum throughput optimization. We answer this question by introducing a dependencies matrix into the maximal throughput optimization in CSMA scheduling. Besides studying the complexity of the CSMA, a direct result of our work is to prove that when the minimum vertex cover of the interference graph is logarithmic in the number of vertices, $O(\log n)$, the suitable strategy for solving the optimization problem is to transmit with probability 1, i.e., a service-rate agnostic approach.

Related works It is known that the problem of maximum throughput in CSMA scheduling is the problem of finding the maximum independent set of the wireless interference graph. Using this intuition, a Glauber dynamic* has been applied to the CSMA problem, known as PGD-CSMA (Parallel Glauber Dynamic CSMA), in [3]. We consider a non-parallel version of that work (GD-CSMA) for ease of modelling. It is proved in [3] that there is no complexity emergence (low mixing time in their context) for complete graphs, but we show this is also true for graphs with a minimum vertex cover size of $\log n$.

Our problem formulation can be bridged to the design problem of low delay maximal throughput CSMA scenarios [4]. The results of our paper can then be applied automatically to this set of problems as well. Our work differs from [4] in its optimization formulation. Moreover, the focus of this paper is to address the complexity decomposition of distributed learning rather than low delay scheduling algorithms.

The Markov chain of our studied CSMA is also similar to [5]. Using state decomposition, the authors of [5] provide a constraint on the size of the independent sets of the graph that can guarantee the fast mixing condition of the Markov chain. In our work the fast mixing condition is part of the throughput optimization problem, a formulation that, to the best of our knowledge, has not been addressed.

* Glauber dynamic is a Markov chain Monte Carlo method that can be used to sample the independent sets of a graph according to a product distribution.

In order to prove Theorem 5, a gradient descent algorithm similar to [6-7] is used.

II. Problem formulation 

Consider a wireless interference graph $\mathcal{G} = (E, V)$ with a set $V$ of $|V| = n$ nodes as links and a set $E$ of edges. There is an edge between nodes $v_i$ and $v_j$ if they cannot transmit simultaneously. Let's represent a feasible schedule $X$ by a vector of the form $(x_i)_{i \in V}$ with $x_i \in \{0, 1\}$ for all $i \in V$. A link $i$ is included in the schedule $X$ if $x_i = 1$. $X$ is a feasible schedule if $x_i + x_j \le 1$, $\forall (i, j) \in E$, that is, if it is an independent set of the interference graph $\mathcal{G}$. Let $\Omega \subseteq \{0, 1\}^n$ be the set of all feasible schedules or independent sets of $\mathcal{G}$. Assume the GD-CSMA scheduling algorithm to be as follows:

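For time $t$,

• Phase 1: Select a link $i$ uniformly at random.

• Phase 2: If $\sum_{j \in \mathcal{N}(i)} x_j(t-1) = 0$, where $\mathcal{N}(i)$ is the set of neighbors of $i$ in $\mathcal{G}$:

(a) $x_i(t) = 1$ with probability $u_i = \frac{\lambda_i}{1 + \lambda_i}$

(b) $x_i(t) = 0$ with probability $1 - u_i = \frac{1}{1 + \lambda_i}$

Else:

$x_i(t) = 0$.

For every link $j \neq i$:

$x_j(t) = x_j(t-1)$.

where we call $u_i$ the transmission strategy and $\lambda_i > 0$ the fugacity parameter.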
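The two phases above can be sketched as a single-slot update. This is only an illustrative sketch: the function names, the 4-link ring graph and the value $\lambda_i = 2$ are hypothetical, and the form $u_i = \lambda_i/(1+\lambda_i)$ is the standard Glauber dynamic choice, assumed here.

```python
import random

def gd_csma_step(x, neighbors, lam, rng):
    """One slot of GD-CSMA: only one uniformly chosen link is updated."""
    i = rng.randrange(len(x))                 # Phase 1: pick a link uniformly at random
    if all(x[j] == 0 for j in neighbors[i]):  # Phase 2: no conflicting neighbor is active
        u_i = lam[i] / (1.0 + lam[i])         # transmission strategy (assumed form)
        x[i] = 1 if rng.random() < u_i else 0
    else:
        x[i] = 0                              # a neighbor is active, so link i stays silent
    return x                                  # every other link keeps its previous state

def is_feasible(x, neighbors):
    """True if no two interfering links are active at the same time."""
    return all(not (x[i] and x[j]) for i in range(len(x)) for j in neighbors[i])

# Hypothetical 4-link ring interference graph
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
rng = random.Random(0)
state = [0, 0, 0, 0]
for _ in range(1000):
    state = gd_csma_step(state, ring, [2.0] * 4, rng)
```

Starting from a feasible schedule, every update keeps the schedule feasible, since a link is only switched on when all of its neighbors are silent.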

Let the packet arrivals of the links follow an i.i.d. Bernoulli distribution with the expected arrival vector $\nu = (\nu_i), \forall i$. Also define the capacity region of the network as:

$$\Lambda = \{\nu \ge 0 \mid \exists \mu \in Co(\Omega), \nu < \mu\} \qquad (1)$$

where $Co(\Omega)$ is the convex hull of the set of feasible schedules, i.e., $\mu \in Co(\Omega)$ if $\mu = \sum_{X \in \Omega} t_X X$, where $\sum_{X \in \Omega} t_X = 1$ and $t_X \ge 0$ can be viewed as the fraction of time that schedule $X$ is used.

The following theorem and the optimization formulation are direct results of [3].

Theorem 1 [3]: The dynamics of the GD-CSMA result in a Markov chain with the following product-form stationary distribution:

$$\pi(X) = \frac{\prod_{i \in X} \lambda_i}{\sum_{X' \in \Omega} \prod_{i \in X'} \lambda_i} \qquad (2)$$

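Optimization formulation:

Let $r_i := \log(\lambda_i)$. Then given any $\nu \in \Lambda$, the service rates of the GD-CSMA can exactly meet the arrival rates of all links when the vector $r^* = (r_i^*), \forall i$ is the solution of the convex optimization problem

$$\max_r F(r; \nu) = \sum_i \nu_i r_i - \log\Big(\sum_{X \in \Omega} \exp\Big(\sum_i x_i r_i\Big)\Big) \qquad (3)$$

$$\text{s.t. } r_i \ge 0, \quad \forall i$$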


A. A simple distributed optimization algorithm 

Taking the partial derivative of (3), with the substitution $s_i := \sum_{X: x_i = 1} \pi(X)$ for the mean service rate of link $i$, yields:

$$\partial F(r; \nu) / \partial r_i = \nu_i - s_i(r) \qquad (4)$$

Using (4), the simple gradient algorithm (5) can be suggested. (5) can be perceived as a distributed algorithm since the link parameters can be adjusted based on the local information of the arrival rate $\nu'_i(t)$ and the service rate $s'_i(r(t))$, taken as the average arrival rate and service rate between time $t$ and $t + 1$.

$$r_i(t+1) = [r_i(t) + \alpha(\nu'_i(t) - s'_i(r(t)))]^+ \qquad (5)$$
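The update (5) can be illustrated on a small graph. The sketch below makes a simplifying assumption that is not part of the algorithm above: the measured rates $\nu'_i(t)$ and $s'_i(r(t))$ are replaced by the exact arrival rates and the exact stationary service rates, computed by enumeration. The graph, arrival vector and step size are hypothetical.

```python
from itertools import product
from math import exp

def service_rates(n, edges, r):
    """Exact s_i(r): total stationary probability of the schedules with x_i = 1."""
    scheds = [x for x in product((0, 1), repeat=n)
              if all(x[i] + x[j] <= 1 for i, j in edges)]
    w = [exp(sum(r[i] * x[i] for i in range(n))) for x in scheds]
    z = sum(w)
    return [sum(wk for x, wk in zip(scheds, w) if x[i]) / z for i in range(n)]

def gradient_step(r, nu, s, alpha):
    """Projected update of (5): r_i <- [r_i + alpha * (nu_i - s_i)]^+."""
    return [max(0.0, ri + alpha * (vi - si)) for ri, vi, si in zip(r, nu, s)]

# Hypothetical 3-link path graph 0 - 1 - 2 and arrival vector inside the capacity region
edges, nu = [(0, 1), (1, 2)], [0.4, 0.4, 0.4]
r = [0.0, 0.0, 0.0]
for _ in range(5000):
    r = gradient_step(r, nu, service_rates(3, edges, r), alpha=0.1)
s = service_rates(3, edges, r)
```

For this example the iteration settles at service rates equal to the arrival rates. In the actual protocol each link only sees noisy averages of these quantities, which is where the mixing time of the chain becomes important.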

1) Complexity of the distributed optimization: The previous distributed approach is feasible only when the average service rate $s'_i(r(t))$ perceived by link $i$ can track the stationary service rate $s_i(r)$ of the CSMA Markov chain fast enough. Let's say it is fast enough when, for every link $i$, (6) is bounded above by some polynomial function $O(\mathrm{poly}(n))$:

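$$\frac{1}{T} \sum_{k=t}^{t+T} \big(s'_i(r(k)) - s_i(r)\big) \qquad (6)$$

Recalling that $s_i = \sum_{X: x_i = 1} \pi(X)$, (6) can be written as:

$$\frac{1}{T} \sum_{k=1}^{T} \Big(\sum_{X: x_i = 1} \mu_{X(t),k}(X) - \sum_{X: x_i = 1} \pi(X)\Big) \qquad (7)$$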

where $\mu_{X(t),k}(X)$ is the distribution of the Markov chain of the schedules after $k$ slots if the Markov chain starts with $X(t)$. The expression (7) can be understood through the mixing time of the Markov chain, $\frac{1}{T} \sum_{k=1}^{T} \|\mu_{X(t),k} - \pi\|_{var}$, which is known to be bounded below by a function exponentially large in the number of links $n$ for some range of the parameter $r$ [8]. This means there exists a transmission strategy $u = (u_i), \forall i$ for which the average service rate cannot follow the stationary distribution fast enough. Therefore selecting the parameter $s'_i(r(t))$ as the reference to update the optimization strategy is ineffective. In other words, the individual optimization solutions are coupled with the optimization of other links to a level that the problem cannot be solved distributively.
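The convergence of the $k$-slot distribution to $\pi$ can be inspected numerically on a small graph by building the one-slot transition matrix explicitly. This is a sketch under stated assumptions: the single-site update uses $u_i = \lambda_i/(1+\lambda_i)$, the chain starts from the empty schedule, and the function names and the 4-link ring are hypothetical.

```python
from itertools import product

def build_chain(n, edges, lam):
    """Feasible states and one-slot transition matrix of the single-site chain."""
    nbrs = {i: [b if a == i else a for a, b in edges if i in (a, b)] for i in range(n)}
    states = [x for x in product((0, 1), repeat=n)
              if all(x[i] + x[j] <= 1 for i, j in edges)]
    index = {x: k for k, x in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for x in states:
        for i in range(n):                        # link i is selected with probability 1/n
            off = x[:i] + (0,) + x[i + 1:]
            if any(x[j] for j in nbrs[i]):        # blocked by an active neighbor
                P[index[x]][index[off]] += 1.0 / n
            else:
                on = x[:i] + (1,) + x[i + 1:]
                u = lam[i] / (1.0 + lam[i])
                P[index[x]][index[on]] += u / n
                P[index[x]][index[off]] += (1.0 - u) / n
    return states, P

def tv_to_stationary(n, edges, lam, k):
    """Total variation distance between the k-slot distribution and pi."""
    states, P = build_chain(n, edges, lam)
    w = []
    for x in states:
        prod_w = 1.0
        for i in range(n):
            if x[i]:
                prod_w *= lam[i]
        w.append(prod_w)
    pi = [v / sum(w) for v in w]
    mu = [1.0 if not any(x) else 0.0 for x in states]   # start from the empty schedule
    for _ in range(k):
        mu = [sum(mu[a] * P[a][b] for a in range(len(states)))
              for b in range(len(states))]
    return 0.5 * sum(abs(m - p) for m, p in zip(mu, pi))

ring_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
d0 = tv_to_stationary(4, ring_edges, [1.0] * 4, 0)
d200 = tv_to_stationary(4, ring_edges, [1.0] * 4, 200)
```

On such a tiny example the distance decays quickly. The concern in the text is how the number of slots needed grows with $n$ for large fugacities, which this enumeration cannot show because the state space grows exponentially.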

Therefore our aim is to include the fast mixing condition in (3). The following section introduces the fast mixing condition as a new constraint in the previous optimization set-up.

In [5], using a state decomposition technique, it is shown that if the probability of going to states $X$ corresponding to independent sets of size more than $\frac{n}{2(\Delta - 1)}$ is rare, with $\Delta$ being the maximum degree of the interference graph and $n$ being the number of nodes (links), then the Markov chain is fast mixing regardless of the Glauber dynamic parameters.

In the following section instead we address the optimization 
problem of maximum throughput under the constraint of fast 
mixing. 

III. Dependencies matrix as a new constraint of the optimization problem
A. Preliminaries 

If $\mu$ and $\nu$ are two probability distributions on $\Omega$, then the total variation distance between $\mu$ and $\nu$ is:

$$d_{TV}(\mu, \nu) := \max_{A \subseteq \Omega} |\mu(A) - \nu(A)| = \frac{1}{2} \sum_{X \in \Omega} |\mu(X) - \nu(X)| \qquad (8)$$

A coupling between two probability distributions $\mu$ and $\nu$ is a pair of random variables $(X, Y)$ such that

• $(X, Y)$ are defined on a common probability space,

• $X$ has distribution $\mu$, and

• $Y$ has distribution $\nu$.

Proposition 1 If $\mu$ and $\nu$ are two probability distributions, then

$$d_{TV}(\mu, \nu) = \min_{(X, Y)\ \text{couplings}} P(X \neq Y). \qquad (9)$$

Example. Let $\Omega = \{0, 1\}$ and set $\mu_p(0) = 1 - p$ and $\mu_p(1) = p$. Then

$$d_{TV}(\mu_p, \mu_q) = \frac{1}{2}\big(|(1 - p) - (1 - q)| + |p - q|\big) = |p - q| \qquad (10)$$

so the coupling using the uniform variable is optimal.

Let $S_j$ be all the pairs of configurations $(X, Y) \in \Omega \times \Omega$ agreeing on the transmission states of all links except link $j$. Then the dependencies matrix is defined as $\mathcal{R} := (R_{ij})$, where $i$ and $j$ are different links and the dependency of link $i$ on link $j$ is:

$$R_{ij} = \max_{(X, Y) \in S_j} d_{TV}(\mu_i(X, \cdot), \mu_i(Y, \cdot)) \qquad (11)$$

where $\mu_i(X, \cdot)$ denotes the marginal distribution of the transmission state of link $i$, for configurations sampled from $\pi$ in (2), conditioned on agreeing with $X$ at all other links.

Theorem 2: Dobrushin condition The GD-CSMA Markov chain is fast mixing when every row sum of the dependencies matrix $\mathcal{R}$ is less than 1.

Proof: Theorem 3 in [10]. ■

The Dobrushin condition roughly states that there is asymptotically no correlation between link $i$ and a link $j$ at distance $d$ from $i$, as $d$ tends to infinity. [8-9] showed a weaker hypothesis for the Dobrushin condition that requires any operator norm of $\mathcal{R}$ to be less than 1.

In the next section we include the Dobrushin condition as a constraint in the dual optimization of (3).

Theorem 3 The dual problem of (3) can be written as (12):

$$\begin{aligned} \min_{p} \quad & \sum_{X \in \Omega} p(X) \log(p(X)) \\ \text{s.t.} \quad & \sum_{X} p(X) x_i \ge \nu_i, \quad \forall i \in V \\ & \sum_{X} p(X) = 1 \\ & 0 \le p(X) \le 1 \end{aligned} \qquad (12)$$

where $p(X)$ is the probability of state $X$ in the CSMA Markov chain.

Proof: Refer to Appendix A. ■

To include the fast mixing condition in (12), let's define an expected dependencies matrix $\bar{\mathcal{R}}$ as follows. Redefine the dependencies metric (11) as:

$$R^X_{ij} = d_{TV}(\mu_i(X, \cdot), \mu_i(Y, \cdot)) \qquad (13)$$

where $Y \in \Omega$ is the same as state $X$ in all links except link $j$. The rest of the parameters are the same as in (11). Denote the matrix of these parameters by $\mathcal{R}^X$. Let's define the expected dependencies matrix $\bar{\mathcal{R}}$ as:

$$\bar{\mathcal{R}} = \sum_{X} p(X) \mathcal{R}^X \qquad (14)$$

It is easy to see that the convex combination of probability state transitions and the dependencies matrix keeps the sum of every row $i$ of the matrix $\bar{\mathcal{R}}$ bounded above by 1 as well. That is:

$$\sum_{X} d_i^X p(X) < 1, \quad \forall i \in V \qquad (15)$$

where $d_i^X$ is the sum of row $i$ in $\mathcal{R}^X$. Now let's rewrite (15) as

$$\sum_{X} (1 - x_i) d_i^X p(X) + \sum_{X} x_i d_i^X p(X) < 1, \quad \forall i \in V \qquad (16)$$

Now using the capacity constraint of (12),

$$\sum_{X} (1 - x_i) d_i^X p(X) < 1 - \nu_i d_{i,\min}, \quad \forall i \in V \qquad (17)$$

where $d_{i,\min} = \min_X d_i^X$ is the row $i$ sum of the dependencies matrix $\mathcal{R}$. By including (17) in (12), we end up with (18):

$$\begin{aligned} \min_{p} \quad & \sum_{X \in \Omega} p(X) \log(p(X)) \\ \text{s.t.} \quad & \sum_{X} (1 - x_i) p(X) \le \frac{1 - \nu_i d_{i,\min}}{d_{i,\min}}, \quad \forall i \in V \\ & \sum_{X} p(X) = 1 \\ & 0 \le p(X) \le 1 \end{aligned} \qquad (18)$$


Note that for $d_{i,\min} = 1$ the problem is the same as (12). The important difference between (18) and (12) is that $d_{i,\min}$ can be written in terms of the dual variable of this optimization problem, as will be shown later. Now that we have embedded the complexity constraint in the dual optimization problem, we can bring the problem back to the primal optimization format by taking the dual of the dual, as formulated in Theorem 4. It is important to return to the primal optimization formulation as it provides a proper framework to introduce the graph characteristics into the complexity of distributed optimizations. This is discussed in Section IV.
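For a small interference graph the dependencies matrix of (11) and the row-sum test of Theorem 2 can be evaluated by brute force from the product-form distribution (2). The sketch below is illustrative only; the function names, the 4-link star graph and the fugacity values are hypothetical.

```python
from itertools import product

def dependencies_matrix(n, edges, lam):
    """Brute-force R_ij of (11) under the product-form distribution (2)."""
    def weight(x):
        if any(x[a] and x[b] for a, b in edges):
            return 0.0                           # infeasible schedule
        w = 1.0
        for i in range(n):
            if x[i]:
                w *= lam[i]
        return w

    def p_on(i, x):
        """P(x_i = 1 | all other links as in x)."""
        on = x[:i] + (1,) + x[i + 1:]
        off = x[:i] + (0,) + x[i + 1:]
        return weight(on) / (weight(on) + weight(off))

    R = [[0.0] * n for _ in range(n)]
    for x in product((0, 1), repeat=n):
        for j in range(n):
            y = x[:j] + (1 - x[j],) + x[j + 1:]  # y differs from x only at link j
            if weight(x) == 0.0 or weight(y) == 0.0:
                continue                         # both configurations must be feasible
            for i in range(n):
                if i != j:
                    # total variation distance of two distributions on {0, 1}
                    R[i][j] = max(R[i][j], abs(p_on(i, x) - p_on(i, y)))
    return R

def dobrushin_holds(R):
    """Row-sum form of the Dobrushin condition."""
    return all(sum(row) < 1.0 for row in R)

star_edges = [(0, 1), (0, 2), (0, 3)]            # link 0 is the center of the star
R_star = dependencies_matrix(4, star_edges, [1.0] * 4)
```

In this example a link depends only on its neighbors, with entry $\lambda_i/(1+\lambda_i)$. For the star with unit fugacities the center row sums to 1.5, so the row-sum condition fails, while for $\lambda_i = 0.25$ every row sum is below 1.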

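Theorem 4 The dual of the optimization set-up (18) is of the form:

$$\max_{\zeta} \mathcal{D}(\zeta; \nu) = -\sum_i \zeta_i \Big(\frac{1}{d_{i,\min}} - \nu_i\Big) - \log\Big(\sum_{X \in \Omega} \exp\Big(-\sum_i \zeta_i (1 - x_i)\Big)\Big) \qquad (19)$$

$$\text{s.t. } \zeta_i \ge 0, \quad \forall i$$

Proof: Similar to the proof of Theorem 3. ■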
B. A service-rate agnostic case

In this section we first present our main theorem. The rest of the paper up to the numerical analysis section is devoted to proving this theorem.

Theorem 5 For an interference graph with a minimum vertex cover size of $O(\log n)$, the update strategy of all links is to transmit with probability 1, i.e., independent of the service rate $s = (s_i), \forall i$.

Proof: To prove the theorem we use a gradient-based algorithm approach similar to [6-7]. The main idea of the following technique is to lower bound the change in the dual value by an auxiliary function and then maximize that bound.

For $\zeta_i \ge 0$, $\delta_i \ge -\zeta_i$,


25(G)<50 = 

log(^ exp(^ -(5i(l - a<0)) - 

X i 

( —— ) - ( — ^ — ) = 

(G + <50 2 ,min (G) (20) 

log(^exp(^ ^"^ C6i))+ 

X I 

r I / G + <5i X / G \ 

-om (-- I A V ~ 

Select $C = 1 + \sum_i (1 - x_i)$. Then by Jensen’s inequality

X i 

+ ( 21 ) 


X i 


c 


log(l + XIXI - 7 T^(exp(-CG) - 1)) 

X t 

Using (21), (20) can be written as:

25 (G)- 25 (G+G) < 

log(X X -7T^(exp(-CG) - 1)) 


X i 


( 22 ) 


~r Oi) Q'2,minl,S'i/ 


We use the auxiliary function $\mathcal{A}(\zeta, \delta)$ to design the gradient-based algorithm.

To continue the proof we use Lemmas 1 and 2.

Lemma 1 Under the formulation ( [T^ exp((^0 = 

Proof: It is easy to show that the relation between the primal and dual variables is given by: for every $X \in \Omega$,

$$p(X) = \frac{\exp\big(-\sum_i \zeta_i (1 - x_i)\big)}{\sum_{X} \exp\big(-\sum_i \zeta_i (1 - x_i)\big)} \qquad (23)$$

comparing with (|2]l and noting that (1 — Xi) > 0 we can see 
that exp(-C0 = \ or exp(Ci) = j-- ■ 

Lemma 2 di,min can be estimated as where di is the 
graph degree of link i. 

Proof: Let X be any configuration. There are only two 
possibilities for the marginal distribution of tt at a link i, 
conditioned on the neighbors agreeing with X. Either some 


neighbor is occupied under X, in which case i is transmitting 
with probability 0, or all neighbor silent, in which case i 
is transmitting with probability and silent otherwise. 

This means that dp„iin can be estimated as since the 

dependencies of i on j is zero except when i and j are 
neighbors. ■ 

The rest of the proof is based on a variational method, that is, to maximize the expectation of the bound $E_X(\mathcal{A}(\zeta, \delta))$. Using linearity of expectation, followed by applying Lemmas 1 and 2 and then taking the partial derivative of $E_X(\mathcal{A}(\zeta, \delta))$, yields:


(Si 


1) exp(-C5i) - 


dEx{A{C,S)) 

dS, 

(1 + G + G) exp((^i -p 5i) 

di 


(24) 


where Si = is the average service rate. Now if 

each link i updates according to a gradient algorithm of the 
following; 


CX^max(0,C‘ + G*(C‘)) (25) 

where U*(C*) ~ ^ then maxX>(^;i/) is achieved. 

First note that ^ > 0 and convexity implies 

that S* to be found at the corners. To have the gradient 
algorithm converge, the solution is 5* = —Ci- This re¬ 
quires ^ Q rpjjjg condition can be achieved if 

C — O(logn) and — 1 < 0,Vi. This is true for all the 
interference graphs except the complete graph. Since G = 0 
means transmission with probability 1 therefore for complete 
graph there exists a link i that Si — 1 = 0. However for a 
complete graph di = n and the third right hand term of the 
(l24l i will be zero and again the ~ ^ q 

The proof is complete by understanding the graphical meaning of $C$. To this end note the following inequality for $C = 1 + \sum_i (1 - x_i), \forall X$:

$$1 + n - \max_X \sum_i x_i \le C \qquad (26)$$

where $n$ and $\max_X \sum_i x_i$ are respectively the number of vertices and the size of the maximum independent set of the interference graph $\mathcal{G}$. From graph theory it is known that the number of vertices of a graph is equal to its minimum vertex cover number plus the size of a maximum independent set (Fig. 1). Therefore the bound in the inequality (26) is the minimum vertex cover plus 1, and a minimum vertex cover of $O(\log n)$ implies $C = 1 + \sum_i (1 - x_i), \forall X$ to be $O(\log n)$ as well. This completes the proof. ■
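The graph identity used in the last step (number of vertices equals minimum vertex cover size plus maximum independent set size) can be checked by brute force on small graphs. This is a minimal sketch with hypothetical helper names, using a 5-node star and a 5-node complete graph as examples.

```python
from itertools import combinations

def max_independent_set_size(n, edges):
    """Largest k such that some k nodes have no edge between them."""
    for k in range(n, -1, -1):
        for subset in combinations(range(n), k):
            s = set(subset)
            if all(not (a in s and b in s) for a, b in edges):
                return k
    return 0

def min_vertex_cover_size(n, edges):
    """Smallest k such that some k nodes touch every edge."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            s = set(subset)
            if all(a in s or b in s for a, b in edges):
                return k
    return n

star5 = [(0, j) for j in range(1, 5)]                  # center 0 with four leaves
complete5 = list(combinations(range(5), 2))            # every pair of links interferes
cover_star = min_vertex_cover_size(5, star5)
cover_complete = min_vertex_cover_size(5, complete5)
```

For the star the cover is the center alone, and for the complete graph on 5 nodes it has size 4. In both cases the cover size and the maximum independent set size add up to 5.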


IV. Numerical analysis of the dependencies matrix $\mathcal{R}$

We have simulated the described CSMA with the number of links $n = 16$. Under these simulations, the dependencies matrices $\mathcal{R}$ for the different interference graphs of Fig. 2 are derived. The probability distances and the corresponding dependencies matrix $\mathcal{R}$ are
achieved by running the the simulations for 10® iterations for 
three different scenarios: 
Fig. 1. Each node represents a link in $\mathcal{G}$. If two links cannot transmit simultaneously there is an edge between them. A minimum vertex cover is a smallest set of nodes that covers all the edges. As can be seen, for the star graph this number is 1, while for a complete graph it is one less than the number of nodes.

Fig. 2. Different graphs $\mathcal{G}$ that are used in the study of the dependencies matrix $\mathcal{R}$ (panels include (d) regular graph, $k = 8$; (e) regular graph, $k = 10$; (f) complete graph).


In the first scenario it is assumed that all links use the same transmission strategy $0 < u < 1$ and that they have packets to transmit all the time. This way the transmission strategy is independent of the arrival rate $\nu_i$ and the service rate $s_i$. The results are shown in Fig. 3. It may be noticed that, except for the star graph, the norm-1 of the dependencies matrices falls below the threshold $\|\mathcal{R}\|_1 = 1$. This may be justified by considering Theorem 5 and examining the minimum vertex cover size of the star graph, $C = 1$, and of the rest of the graphs, $C > \log(n)$. We have observed some values of more than 1 for the complete graph and the circular graph, but this should be due to our small number of links $n$ and the limited number of iterations. For $\|\mathcal{R}\|_1 > 1$, each link has a non-negligible effect on every other link in the system. Therefore the whole scheduling system is more than just the aggregate of distributed scheduling links. This demonstrates the emergence property of complex physical systems.

Another scenario is simulated where the packets arrive at the links according to a Bernoulli distribution with parameter $\nu$ in the range shown in Fig. 4. The range is carefully selected to keep it within the capacity region of the network, eq. (1). The links update their transmission parameters according to (5) with the learning rate $\alpha = 0.01$. The initial transmission probability is 0.5 for all links. This simulation shows that the efficiency of the distributed optimization of (5) greatly deteriorates for the interference graphs with high $C = O(\log n)$. Values of norm-1 $\|\mathcal{R}\|_1 > 1$ imply high coupling among the distributed link optimizations, where the whole system optimization cannot be decomposed into distributed link optimizations. This shows the emergence property of complex adaptive systems. That is, the local observations of the links under the distributed optimizations ($s = (s_i)$) are tied to each other to a level that the best strategy is a service-agnostic one, as predicted by Theorem 5. Note that unlike the graphs with a high minimum vertex cover size of $C = \log n$ (or higher), the norm-1 of the star configuration stays at almost zero.

Fig. 3. Norm-1 of the dependencies matrix $\mathcal{R}$ for different interference graphs; a service/arrival rate agnostic approach.

The third scenario in Fig. 5 is the same as the second one with the difference that the learning rate is selected to be

the time dependent according to a{t) = (i+to.a) iog(i^_tu.a) 
where t is the time of the current strategy update. This time 
dependent learning rate has been proven to avoid complexity 
of the system [3]. Another strategy update mechanism is to 
bind the transmission probability by where di is the degree 
of link $i$. However, in the formulation of [3] the complexity avoidance concern (the fast mixing condition in their context) is not part of the optimization problem, so the resulting strategy is clearly not optimal. For example, in the case of a $k$-regular graph,
the transmission strategy of the links should be less than i 
however, the result of Fig. 3 shows that complexity does not emerge for any value of the transmission probability (the norm of the dependencies matrix stays below 1 for all transmission probabilities). The complexity concern in our paper is an internal part of the optimization formulation.

V. Conclusion 

Cognitive services in wireless networks have provided alternative approaches for exploiting the existing resources. These services have been realized by giving the network elements the ability to learn from the radio environment and from their interactions with the rest of the network. These cognitive abilities shift the underlying models of communication systems from complex physical systems to complex adaptive systems. This is because of the ability of cognitive nodes to interact with each other in a distributed way, where each node not only learns from the radio environment but also interacts with other nodes. This increases the complexity of wireless networks.

The question we address here is how the arrival rate, the interference graph and the simple gradient methods for adjusting the transmission rates affect the complexity of maximum throughput optimization. We answered this question by introducing a dependencies matrix into the maximal throughput optimization in CSMA scheduling. Besides studying the complexity of the CSMA, a direct result of our work in Theorem 5 is to prove that when the minimum vertex cover of the interference graph is logarithmic in the number of vertices, $O(\log n)$, the suitable strategy for solving the optimization problem is to transmit with probability 1, i.e., a service-rate agnostic approach.

Fig. 4. Norm-1 of the dependencies matrix $\mathcal{R}$ for different expected arrival vectors $\nu : (\nu_i = \nu), \forall i$ and different interference graphs, generated using the distributed optimization of (5) with constant $\alpha = 0.01$.

Fig. 5. Norm-1 of the dependencies matrix $\mathcal{R}$ for different expected arrival vectors $\nu : (\nu_i = \nu), \forall i$ and different interference graphs, generated using the distributed optimization of (5) with a time-variant learning rate.

The CSMA scenario was simulated to derive the dependencies matrix and its connection with the interference graph, the transmission strategy and the arrival rate of the links. The result of Theorem 5 was confirmed by the results of our simulations.
Appendix A
Proof of Theorem 3

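Proof: We prove that the dual of (12) is (3). The Lagrangian for (12) is

$$\mathcal{L}(p, r, \alpha, \gamma) = -H(p) + \sum_i r_i \Big(-\sum_X p(X) x_i + \nu_i\Big) - \sum_X \alpha_X p(X) + \gamma \Big(\sum_X p(X) - 1\Big) \qquad (27)$$

where $H(p) = -\sum_{X \in \Omega} p(X) \log p(X)$. In order to derive the dual Lagrangian let’s take the first derivative with respect to $p(X)$ for all states $X$. This yields:

$$\partial \mathcal{L} / \partial p(X) = \log p(X) + 1 - \sum_i r_i x_i - \alpha_X + \gamma \qquad (28)$$

Let $p^* = \arg\inf_p \mathcal{L}$, then

$$p^*(X) = \exp\Big(-\gamma - 1 + \sum_i r_i x_i + \alpha_X\Big) \qquad (29)$$

This yields the dual Lagrangian function

$$f(r, \alpha, \gamma) = \inf_p \mathcal{L}(p, r, \alpha, \gamma) = -\sum_X p^*(X) - \gamma + \sum_i r_i \nu_i \qquad (30)$$

To optimize the Lagrange dual function let’s take the derivative with respect to $\gamma$, that is

$$\partial f / \partial \gamma = \sum_X \exp\Big(-\gamma - 1 + \sum_i r_i x_i + \alpha_X\Big) - 1 \qquad (31)$$

Setting (31) to zero yields that

$$\gamma^* = \arg\sup_\gamma f(r, \alpha, \gamma) \qquad (32)$$

satisfies

$$\exp(\gamma^* + 1) = \sum_X \exp\Big(\sum_i r_i x_i + \alpha_X\Big) \qquad (33)$$

Denoting the right-hand side of (33) by $Z(r, \alpha)$, we then have

$$f(r, \alpha) = f(r, \alpha, \gamma^*) = -\ln Z(r, \alpha) + \sum_i r_i \nu_i \qquad (34)$$

Since for all $X$, $\partial f / \partial \alpha_X \le 0$, we have $\alpha^* = 0$, where $\alpha^* = \arg\sup_{\alpha \ge 0} f(r, \alpha)$. Therefore we have

$$f(r) = f(r, \alpha^*) = -\ln Z(r) + \sum_i r_i \nu_i \qquad (35)$$

where

$$Z(r) = Z(r, \alpha^*) = \sum_X \exp\Big(\sum_i x_i r_i\Big)$$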